Learning Dow Jones From Twitter Sentiment

Authors

  • Benjamin Au
  • Zhang

Abstract

In 2010, Bollen reported that Twitter sentiment is highly predictive of the stock market [1]. We hypothesized that, while Bollen obtained significant results from the full breadth of the Twitter pipeline, restricting the pipeline to 'high-impact' financial tweets would strengthen the signal and further improve results. We therefore filtered a dataset from the Twitter pipeline for high-impact tweets, both by user and by a 'short list' of finance-related keywords, applied sentiment analysis to the filtered data, and combined the result with DJIA stock market outcomes. Analyzing this data with logistic regression, SVM and time series analysis, we found modest predictability, peaking at 62.32%. While our filtered approach did not reach the levels claimed by Bollen, our results indicate that appropriate pre-filtering of Twitter data is a necessary step in future analyses of the predictive power of the Twitter pipeline, in order to maximize the sentiment signal.

1. Introduction

In behavioral economics, market outcomes are affected by the sentiment of the market agents themselves. Twitter, a platform publishing over 400 million tweets per day, appears to be a treasure trove of data from which to mine a proxy for market sentiment. Bollen achieved impressive results relating the sentiment of the Twitter pipeline, measured in 6 dimensions, to the Dow Jones Industrial Average. Motivated by these promising results, we suspected we could augment Bollen's analysis in a few ways.

1) We would filter the Twitter data to use only high-impact, finance-related tweets. Preliminary analysis of the tweet dataset quickly showed that the vast majority of tweets were inane and utterly unrelated to the stock market. We suspected that proper filtering would reduce the risk of a 'garbage in, garbage out' dataset with a low signal-to-noise ratio.
2) We would apply SVM techniques to the filtered data to find an efficient decision boundary.
3) We would apply time series methods in our prediction.

We were motivated by the potential of combining these tools to build on the results of Bollen.

2. Data

2.1 Source

Our dataset came from two sources. The first is a slice of the full Twitter pipeline from June 11 to December 31, 2009, consisting of about 476 million tweets [2]. Each tweet consists of a timestamp, a username and the tweet content. The second is the daily closing price of the Dow Jones Industrial Average over the same period as our Twitter dataset [3].

2.2 Preprocessing

To improve the signal of our dataset, we pre-filtered it for high-impact users as well as high-impact content. First, we generated a list of 131 high-impact finance-related Twitter users [4][5] and kept only the tweets written by those users. Individuals on this short list are likely to tweet finance-related content and should better approximate the sentiment of the stock market. Second, we filtered the dataset on 20 high-impact finance-related keywords that we picked ourselves. Filtering on these keywords increases the signal of the dataset by keeping only tweets whose content relates to the stock market, which in preliminary analysis made up a small minority of all tweets. Further preprocessing scrubbed the tweet content, including converting all content to lowercase and removing all tweets that were not in English. With these preprocessing methods we obtained a cleaner dataset with far less noise than the original. Because the original dataset contained about 476 million tweets, filtering did not put the overall sample size at risk.

2.3 Sentiment Analysis

To obtain the sentiment of each tweet in our filtered dataset, we used a preconstructed Twitter Sentiment Analysis word list by Alex Davies, which gives "happiness" and "sadness" dimensions for each tweet token. The overall sentiment of a tweet was computed by averaging the sentiment of all of its tokens that appear in the word list. We used these sentiment statistics as the basis of our analysis.

3. Machine Learning Models

Our goal was to use machine learning techniques to predict, from the sentiment data of a given day, a binary change (positive or negative) in the DJIA closing price of the following day. Given our DJIA and tweet sentiment data, we performed several machine learning analyses: logistic regression, SVM with linear, radial and sigmoid kernels, and time series techniques that include previous-day DJIA changes. We applied several cross-validation schemes to train our algorithms: 10-fold, 20-fold and leave-one-out cross-validation (LOOCV). The results follow in Section 4. Section 4.1 shows results for tweets pre-filtered for high-impact Twitter users, 4.2 shows results of the same analysis for high-impact tweet content, and 4.3 shows results for combining the high-impact user and high-impact keyword techniques.

4. Results and Discussion

Notations for this section:
• Time unit: day
• t: today; t+1: tomorrow; t-1: yesterday; etc.
• Happy_U, Sad_U represent sentiment values generated from high-impact users
• Happy_W, Sad_W represent sentiment values generated from high-impact tweet content
• LIBLINEAR and LIBSVM are SVM libraries
• For LIBSVM, we omitted the results of 10-fold and 20-fold cross-validation and kept only the LOOCV results

After fitting a variety of sentiment-based and outcome-based models, we found that, across the high-impact user models, the high-impact keyword content models and the combined models, the best model was Model 4.3d, which predicts DJIA(t+1) from the independent variables Happy_U(t), Sad_U(t), Happy_W(t), Sad_W(t), Happy_U(t-1), Sad_U(t-1), Happy_W(t-1), Sad_W(t-1), Happy_U(t-2), Sad_U(t-2), Happy_W(t-2) and Sad_W(t-2). Using this model with SVM and leave-one-out cross-validation, we achieved our greatest predictive power: 62.32% for this combined model. This suggests that including time series terms for previous-day DJIA close and sentiment helps boost next-day predictive power. See the figures below for detailed results.

Within each model type, all models performed similarly, with accuracies spanning a range of no more than 3%. High-impact user models performed better on the whole than high-impact keyword models, and a mix of the two performed better than either in isolation. In addition, because our data for the period is scarce (140 trading days), we focus on the LOOCV results.

4.1 High-Impact User Results

The following tables show the results of applying the high-impact user filtering technique and then running logistic regression and SVM with linear, sigmoid and radial kernels on the resulting sentiment. We used various combinations of happy sentiment, sad sentiment and previous DJIA outcome to model the DJIA outcome, and we collected results from 6 different models with different lags.
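The lagged models described above can be assembled from daily series roughly as follows. This is a minimal sketch, not the authors' code: the function name `build_lagged_rows`, its parameters and the toy numbers are all hypothetical, and it assumes one happy value, one sad value and one binary DJIA label (1 for an up day, 0 for a down day) per trading day.

```python
def build_lagged_rows(happy, sad, djia_up, n_lags=1, include_prev_label=True):
    """Build (features, label) pairs for DJIA(t+1) ~ sentiment(t), ..., sentiment(t-n_lags).

    happy, sad : per-day sentiment averages (hypothetical inputs)
    djia_up    : per-day binary labels, 1 if the DJIA closed up that day, else 0
    """
    if not (len(happy) == len(sad) == len(djia_up)):
        raise ValueError("all series must cover the same days")
    rows = []
    # Day t needs n_lags days of history behind it and one day (t+1) ahead of it.
    for t in range(n_lags, len(djia_up) - 1):
        features = []
        for lag in range(n_lags + 1):
            features.extend([happy[t - lag], sad[t - lag]])
        if include_prev_label:
            features.append(djia_up[t])  # today's outcome as an extra predictor
        rows.append((features, djia_up[t + 1]))
    return rows


# Toy data for five days (made-up values, for illustration only).
rows = build_lagged_rows(
    happy=[0.1, 0.2, 0.3, 0.4, 0.5],
    sad=[0.5, 0.4, 0.3, 0.2, 0.1],
    djia_up=[1, 0, 1, 1, 0],
    n_lags=1,
)
for features, label in rows:
    print(features, "->", label)
```

Note that each additional lag costs one usable day at the start of the series, so models with more lags are trained and evaluated on slightly fewer samples.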
The best model was Model 3: DJIA(t+1) ~ Happy_U(t) + Sad_U(t) + Happy_U(t-1) + Sad_U(t-1) + DJIA(t), using SVM with a linear kernel, which achieved a predictive power of 59.7122%. The results of our basic model and best model are listed below:

(Basic) Model 1: DJIA(t+1) ~ Happy_U(t) + Sad_U(t)

LIBLINEAR
  L2 Logistic    L1 Logistic    10-fold CV    20-fold CV    LOOCV
  55.7143%       55.7143%       57.1429%      57.1429%      56.4286%

LIBSVM (LOOCV)
  Linear Kernel    Radial Kernel    Sigmoid Kernel
  55.7143%         53.5714%         55.7143%

Figure 4.1a Results from Logistic Regression and SVM on High-Impact User Model: DJIA(t+1) ~ Happy_U(t) + Sad_U(t)

Model 3: DJIA(t+1) ~ Happy_U(t) + Sad_U(t) + Happy_U(t-1) + Sad_U(t-1) + label(t)

Logistic Regression
  55.3957%

LIBSVM (LOOCV)
  Linear Kernel    Radial Kernel    Sigmoid Kernel
  55.3957%         56.1151%         55.3957%

LIBLINEAR
  10-fold CV    20-fold CV    LOOCV
  57.5540%      59.7122%      58.9928%

Figure 4.1c Results from Logistic Regression and SVM on Time-Series Sentiment and Outcome on High-Impact User Model: DJIA(t+1) ~ Happy_U(t) + Sad_U(t) + Happy_U(t-1) + Sad_U(t-1) + DJIA(t)

4.2 High-Impact Tweet Content Results

Following the same approach as in 4.1, we obtained results from filtering tweets by high-impact content keywords. We performed similar analyses using logistic regression and SVM with linear, radial and sigmoid kernels, along with 10-fold, 20-fold and leave-one-out cross-validation. The best results came from Model 4: DJIA(t+1) ~ Happy_W(t) + Sad_W(t) + Happy_W(t-1) + Sad_W(t-1) + Happy_W(t-2) + Sad_W(t-2), using SVM with a linear kernel and cross-validation. This model achieved a predictive power of 55.7971%. The results of the models based on data filtered by high-impact tweet content are not as strong as those in 4.1.
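Leave-one-out cross-validation, on which these results rely, can be sketched in a few lines. The sketch below is pure Python and uses a deliberately simple nearest-centroid classifier as a stand-in; it is not LIBLINEAR, LIBSVM or logistic regression, and every function name in it is hypothetical. The point is only the evaluation loop: each day is held out once, the classifier is trained on the remaining days, and accuracy is the fraction of held-out days predicted correctly.

```python
def nearest_centroid_fit(X, y):
    """Return {label: centroid} for feature vectors X and labels y (stand-in classifier)."""
    sums, counts = {}, {}
    for features, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def nearest_centroid_predict(centroids, features):
    """Predict the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    # sorted() makes tie-breaking deterministic (smallest label wins).
    return min(sorted(centroids), key=lambda label: sq_dist(centroids[label]))


def loocv_accuracy(X, y, fit=nearest_centroid_fit, predict=nearest_centroid_predict):
    """Hold out each sample once, train on the rest, return the fraction predicted correctly."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        model = fit(train_X, train_y)
        if predict(model, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


# Toy, well-separated data (made-up values): every held-out point is classified correctly.
X = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
y = [0, 0, 0, 1, 1, 1]
print(loocv_accuracy(X, y))  # 1.0
```

Because `fit` and `predict` are passed in as arguments, any other classifier with the same two-function shape could be dropped into the same loop.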
The results of our best model are listed below:

Model 4: DJIA(t+1) ~ Happy_W(t) + Sad_W(t) + Happy_W(t-1) + Sad_W(t-1) + Happy_W(t-2) + Sad_W(t-2)

Logistic Regression
  56.5217%

LIBSVM (LOOCV)
  Linear Kernel    Radial Kernel    Sigmoid Kernel
  55.7971%         55.7971%         55.7971%

LIBLINEAR
  10-fold CV    20-fold CV    LOOCV
  54.3478%      55.7971%      55.7971%

Figure 4.2d Results from Logistic Regression and SVM on Extended Time-Series Sentiment on High-Impact Content Model: DJIA(t+1) ~ Happy_W(t) + Sad_W(t) + Happy_W(t-1) + Sad_W(t-1) + Happy_W(t-2) + Sad_W(t-2)

4.3 Combined User/Content Model Results

Finally, we combined the high-impact user and high-impact keyword content models from 4.1 and 4.2, using the same modeling techniques. The best results again came from time series Model 4: DJIA(t+1) ~ Happy_U(t) + Sad_U(t) + Happy_W(t) + Sad_W(t) + Happy_U(t-1) + Sad_U(t-1) + Happy_W(t-1) + Sad_W(t-1) + Happy_U(t-2) + Sad_U(t-2) + Happy_W(t-2) + Sad_W(t-2). This model achieved an accuracy of 62.3188%. The best result is listed below:

Model 4.3d: DJIA(t+1) ~ Happy_U(t) + Sad_U(t) + Happy_W(t) + Sad_W(t) + Happy_U(t-1) + Sad_U(t-1) + Happy_W(t-1) + Sad_W(t-1) + Happy_U(t-2) + Sad_U(t-2) + Happy_W(t-2) + Sad_W(t-2)

Logistic Regression
  57.2464%

LIBSVM (LOOCV)
  Linear Kernel    Radial Kernel    Sigmoid Kernel
  61.5942%         62.3188%         56.5217%

LIBLINEAR
  10-fold CV    20-fold CV    LOOCV
  60.8696%      61.5942%      60.8696%

Figure 4.3d Results from Logistic Regression and SVM on Extended Time-Series Sentiment on High-Impact Content Model: DJIA(t+1) ~ Happy_U(t) + Sad_U(t) + Happy_W(t) + Sad_W(t) + Happy_U(t-1) + Sad_U(t-1) + Happy_W(t-1) + Sad_W(t-1) + Happy_U(t-2) + Sad_U(t-2) + Happy_W(t-2) + Sad_W(t-2)
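As a consistency check, each reported accuracy can be matched to a whole number of correctly classified days. The sketch below rests on an assumption that the text does not state: that a model using k days of lagged sentiment is evaluated on 140 - k samples, where 140 is the number of trading days mentioned in Section 4. The helper name `matching_counts` is hypothetical.

```python
def matching_counts(reported_percent, n_days, decimals=4):
    """Return every count c for which c / n_days, as a percentage, rounds to reported_percent."""
    target = f"{reported_percent:.{decimals}f}"
    return [
        c
        for c in range(n_days + 1)
        if f"{100.0 * c / n_days:.{decimals}f}" == target
    ]


# Assumed evaluation sizes: 140 trading days minus the number of lagged days.
print(matching_counts(55.7143, 140))  # Model 1, no lag:      [78]
print(matching_counts(59.7122, 139))  # Model 3, one lag:     [83]
print(matching_counts(62.3188, 138))  # Model 4.3d, two lags: [86]
```

Under that assumption, the best combined result of 62.3188% corresponds to 86 of 138 days predicted correctly, and neighbouring table entries differ by only one or two days, which gives a sense of how small the gaps between models are in absolute terms.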

Publication date: 2013